ABSTRACT

This meta-analysis explored how measuring student
progress toward long- vs. short-term goals affects achievement
outcomes. Twenty-one controlled studies were coded in terms of
measuring method (toward long- vs. short-term goals) and type of
achievement outcome (probe-like vs. global achievement test).
Analogues to analysis of variance conducted on weighted unbiased
effect sizes (UESs) indicated an interaction: When progress was
measured toward long-term goals, UESs on global measures were higher
than on probe-like outcomes; when progress was measured toward series
of short-term goals, the reverse was true. Implications for special
education practice are discussed. (Author)

The Effect of Measuring Student Progress Toward Long- vs. Short-Term Goals:

A Meta-Analysis

In special education, commercial norm-referenced achievement tests 
represent the traditional and predominant measurement tool for generating 
individualized instructional programs and for evaluating the effects of those 
programs (Ysseldyke & Thurlow, 1984). Despite the prevalence of this measurement
approach, it increasingly has been criticized (see Tindal et al., 1985; Ysseldyke 
& Thurlow, 1984). With respect to generating educational programs, critics 
contend that the abilities measured by these instruments frequently lack 
necessary conceptualization (Ysseldyke, 1979), and relatedly that the tests often 
fail to demonstrate adequate psychometric properties (Salvia & Ysseldyke, 1985). 
In terms of program evaluation, critics argue that these measures fail to: (a)
indicate the extent to which specific educational objectives have been attained
(Skager, 1971), (b) provide enough alternate forms to permit ongoing progress
monitoring, (c) sample the domains of interest comprehensively (Zigmond &
Silverman, 1984), and (d) relate to curricular materials (Armbruster, Stevens, & 
Rosenshine, 1977; Jenkins & Pany, 1978). 

In response to these problems, ongoing criterion-referenced, 
curriculum-based assessment (CBA) strategies have been developed. With CBA, 
measurement procedures are designed to match students' program objectives. 
Alternate test forms are drawn directly from curricula specified in objectives 
and are administered at regular intervals during intervention; student progress 
data are evaluated regularly with reference to the performance criteria specified
in objectives; and individualized programs are modified as required to ensure
attainment of objectives. Therefore, with CBA, instructional program evaluation 
is ongoing and based in the curriculum; program development is inductive, in

CBA not only is conceptually stronger than traditional assessment
strategies. Data also indicate that it represents an effective alternative
approach to program development and evaluation, with an average effect size
across available controlled studies of .70 (Fuchs & Fuchs, in press). This
indicates that, in terms of the standard normal curve and an achievement test 
scale with a population mean of 100 and standard deviation of 15, the use of CBA 
to generate and evaluate individualized programs can be expected to raise the 
typical achievement outcome score from 100 to 110.50, or from the 50th to the 
76th percentile. 
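
The arithmetic behind this interpretation can be sketched as follows. This is a minimal illustration, assuming a normally distributed achievement scale; the function name and defaults are illustrative and do not come from the studies reviewed.

```python
from statistics import NormalDist

def expected_score_and_percentile(effect_size, mean=100.0, sd=15.0):
    """Translate a standardized effect size into an expected score and percentile."""
    # Shift the population mean by effect_size standard deviations.
    expected_score = mean + effect_size * sd
    # Percentile rank of that score under the standard normal curve.
    percentile = NormalDist().cdf(effect_size) * 100
    return expected_score, percentile

score, pct = expected_score_and_percentile(0.70)
print(f"expected score: {score:.2f}, percentile: {pct:.0f}")
```

An effect size of .70 moves the typical score by .70 x 15 = 10.5 points, and the normal curve places that score at roughly the 76th percentile.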

Additionally, the requirements of federal legislation seem to indicate 
the importance of CBA: The IEP mandate of PL 94-142 requires special educators 
to specify long-term goals, short-term objectives, and assessment procedures for
monitoring students' attainment of goals and objectives. Assuming that the 
intent of this legislation was to base goals and objectives in pupils' curricula, 
then the IEP mandate requires a CBA approach to progress evaluation. 

Despite the apparent effectiveness of and seeming necessity for CBA, it 
remains unclear how practitioners should design CBA procedures to monitor 
students' attainment of goals and objectives. One reason for this lack of 
clarity stems from the IEP mandate, itself, which fails to specify whether 
student progress should be monitored toward the relatively broad goal statements 
or the more numerous and narrow objectives that typically are generated for each 
IEP goal. Currently, practitioners can select between two types of CBA, one 
focusing on the attainment of long-term goals (CBA-goal) and the other of 
short-term objectives (CBA-objective).

With the CBA-goal approach, an annual curriculum-based goal is specified 
and a large pool of related measurement items is created. From this measurement 
pool, subsets of items, or monitoring probes, are drawn randomly (see Fuchs, 
Deno, & Mirkin, 1984). The difficulty level of the monitoring probe remains 
constant over a long time. Contrastingly, with the CBA-objective approach, a
series of objectives corresponding to steps within a hierarchical curriculum is
specified, and a series of relatively circumscribed, small pools of items are
created, each of which corresponds to a specific objective (see Lindsley, 1971;
White & Haring, 1980). The difficulty level of material on which students are
measured increases as students master the sequentially-related objectives. 

Both types of CBA are ongoing, criterion-referenced, curriculum-based, 
and enjoy strong curricular validity or correspondence between tests and 
programmatic goals and objectives (McClung cited in Popham & Yalow, 1984).
However, these systems do differ conceptually. CBA-objective appears to have
stronger instructional validity or correspondence between tests and instruction
(Yalow & Popham, 1984). The monitoring probes for short-term measurement are
related directly to current instructional material, so, for example, if an
instructional intervention is introduction of the r-controlled phonics rule, the
monitoring measure is reading r-controlled words. Alternately, with CBA-goal,
the monitoring probes are not related to the instructional material. The
instructional intervention may be introduction of the r-controlled phonics rule,
whereas the monitoring measure may involve oral reading fluency, accuracy, and/or
comprehension on second grade passages. 

Although CBA-objective may enjoy stronger instructional validity,
CBA-goal is advantageous in other respects. It possesses better content validity
or representation of the ultimate desired performance, i.e., reading
fluency/comprehension (Yalow & Popham, 1984). Additionally, its concurrent
validity or correlation with other measures of achievement appears to be stronger
than that of CBA-objective (Fuchs, 1982).

The emergent question, and the focus of the current meta-analysis, is how 
well these types of ongoing criterion-referenced, curriculum-based assessment 
strategies relate to outcome measures of student achievement. The investigation 
of this question should help practitioners assess the relative merits of the two
types of CBA, and select CBA monitoring procedures that maximize student growth. 

Method 

Search Procedure 

The search for pertinent studies to include in the meta-analysis 
comprised four steps. First, employing the Thesaurus of Psychological Index 
Terms (APA, 1982), multiple descriptors were generated for key terms. For 
example, student achievement alternately was represented by "student progress," 
"goal attainment," and "educational effects." Second, these terms facilitated a 
computer search of three on-line data bases: (a) ERIC, a data base of educational 
materials from the Educational Resources Information Center consisting of 
abstracts from Research in Education and Current Index to Journals in Education;
(b) Comprehensive Dissertation Abstracts; and (c) Psychological Abstracts.
Third, employing similar key descriptors, a manual search was conducted of five
educational journals for the years 1973 through 1983. These journals were:
American Educational Research Journal, Journal of Learning Disabilities,
Journal of Precision Teaching, Journal of Special Education, and Learning
Disability Quarterly. Fourth, the reference sections of relevant papers along
with identified bibliographies were explored for additional studies.

Criteria for Relevant Studies 

A study was considered for inclusion if it employed a control group to 
evaluate the effects of curriculum-based monitoring on academic achievement. Such 
monitoring was defined as curriculum-based data collection that occurred at least 
twice weekly, with decisions concerning the adequacy of programs formulated on an
individual, not group, basis. Studies were excluded that (a) monitored social
behaviors, (b) primarily focused on the use of behavior modification, while
employing time series to test experimental effects, (c) provided test feedback
only to students, and/or (d) employed college age students as subjects.

The search yielded 29 studies that met the criteria established for
inclusion. From these studies, 8 were eliminated because of insufficient data
for calculating meta-analytic statistics.



Data Extracted from Each Study 

Data aggregation. Guidelines were established to ensure that each
relevant effect was counted only once in analyses. When an effect was measured
by different instruments or by subtests that failed to represent dimensions
relevant to the meta-analysis (i.e., Reading Comprehension and Structural
Analysis Subtests of the Stanford Diagnostic Reading Test), results from the
instruments or subtests were pooled. For example, if achievement within a study
were measured with three global tests and two probe-like measures, the three 
effect sizes for the global tests would be averaged as would be done for the two 
probe-like tests. So, two, rather than five, effect sizes would be included for 
such a study. 
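
The pooling rule described above can be sketched as a simple grouping step. This is a hedged illustration only: the study label, the tuple layout, and the effect size values are hypothetical, and a simple unweighted average within each study and outcome type is assumed.

```python
from collections import defaultdict
from statistics import mean

def pool_effect_sizes(effects):
    """Average effect sizes that share a study and an outcome type.

    `effects` is a list of (study_id, outcome_type, effect_size) tuples.
    Returns one pooled effect size per (study_id, outcome_type) pair.
    """
    groups = defaultdict(list)
    for study_id, outcome_type, es in effects:
        groups[(study_id, outcome_type)].append(es)
    return {key: mean(values) for key, values in groups.items()}

# Hypothetical study with three global tests and two probe-like measures.
raw = [
    ("study_A", "global", 0.40),
    ("study_A", "global", 0.60),
    ("study_A", "global", 0.50),
    ("study_A", "probe", 0.80),
    ("study_A", "probe", 1.00),
]
pooled = pool_effect_sizes(raw)
print(pooled)
```

Five raw effect sizes collapse to two entries, one per outcome type, which mirrors the example in the text.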

Definition and calculation of effect size. Results of the studies were 
transformed to a common metric, effect size, defined here as the difference 
between the treatment means, divided by the control group standard deviation. 
For purposes of analysis, an effect was given a positive sign if subjects 
achieved greater scores in the systematic monitoring treatment. For studies 
reporting relevant means and standard deviations for both groups, effect sizes
were calculated from these measurements. For studies not reporting means and
standard deviations, effect sizes were calculated from other statistics, such as
F or t values (see Glass, McGaw, & Smith, 1981).
Each effect size was converted to an unbiased effect size (UES) to
correct for inconsistency in estimating true from observed effect sizes (Hedges,
1981). The difference between the observed and unbiased effect sizes was
negligible (X̄ = .019, SD = .025), as has been demonstrated elsewhere
(Bangert-Drowns, Kulik, & Kulik, 1983). Nevertheless, UESs were employed to
ensure the mathematical tractability of the data.
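
A minimal sketch of the two computations described above follows. The effect size function uses the definition given in the text (mean difference divided by the control group standard deviation). The correction factor is the commonly cited approximation associated with Hedges (1981); the text does not print the formula, so its exact form here, and the choice of degrees of freedom, are assumptions. All numbers are hypothetical.

```python
def glass_delta(mean_treatment, mean_control, sd_control):
    """Effect size as defined in the text: mean difference over control-group SD."""
    return (mean_treatment - mean_control) / sd_control

def unbiased_effect_size(effect_size, df):
    """Approximate small-sample correction, c(m) = 1 - 3 / (4m - 1).

    `df` is the degrees of freedom of the SD in the denominator; with a
    control-group SD this is taken to be n_control - 1 (an assumption).
    """
    return effect_size * (1 - 3 / (4 * df - 1))

# Hypothetical study: treatment mean 54, control mean 47, control SD 10, n_control = 20.
d = glass_delta(54.0, 47.0, 10.0)
ues = unbiased_effect_size(d, df=20 - 1)
print(f"d = {d:.3f}, UES = {ues:.3f}, difference = {d - ues:.3f}")
```

Because the correction factor approaches 1 as sample size grows, the observed and unbiased values differ only slightly, which is consistent with the small average difference reported above.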

There were 96 effect sizes, with between 1 and 12 effect sizes per study. 
Analyses indicated no statistical dependency between effect size magnitude and 
number of comparisons per study (r = .12). Therefore, UESs were aggregated at
the individual effect size level. In combining these UESs, weighted averages 
were calculated to account for the variances of the UESs (see Hedges, 1984). 
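
Weighting by variance can be sketched as an inverse-variance weighted mean. This is an illustration under stated assumptions: the variance expression is the usual large-sample approximation for a standardized mean difference, which the text does not reproduce, and the sample values are hypothetical rather than drawn from the reviewed studies.

```python
import math

def effect_size_variance(d, n_treatment, n_control):
    """Large-sample variance approximation for a standardized mean difference."""
    n_total = n_treatment + n_control
    return n_total / (n_treatment * n_control) + d ** 2 / (2 * n_total)

def weighted_mean_effect(effects):
    """Inverse-variance weighted mean, its standard error, and a z value.

    `effects` is a list of (d, n_treatment, n_control) tuples.
    """
    weights = [1 / effect_size_variance(d, nt, nc) for d, nt, nc in effects]
    total_weight = sum(weights)
    mean_d = sum(w * d for w, (d, _, _) in zip(weights, effects)) / total_weight
    se = math.sqrt(1 / total_weight)
    return mean_d, se, mean_d / se

# Hypothetical UESs with their group sizes (not data from the reviewed studies).
sample = [(0.85, 20, 20), (0.40, 50, 50), (0.65, 30, 35)]
mean_d, se, z = weighted_mean_effect(sample)
print(f"weighted mean = {mean_d:.3f}, SE = {se:.3f}, z = {z:.2f}")
```

Effect sizes from larger samples have smaller variances and so pull the weighted mean toward themselves; here the weighted mean falls below the simple average because the largest hypothetical study has the smallest effect.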

Study features. To describe study features pertinent to the current 
investigation, two major substantive variables were identified and coded for each 
study. The first study feature was type of goal. This variable had two
levels that differentiated studies in which progress toward long-term goals
(CBA-goal) was monitored from studies in which progress toward a short-term
objective or a series of short-term objectives (CBA-objective) was monitored.

Studies in which progress toward long-term goals was monitored involved
the specification of a level of material on which a student was expected to be
proficient within the next 15 or more weeks. For example, for a student
currently reading proficiently on primer material, the student's goal might specify
that, in 25 weeks, the student would read 75 words per minute correct with 90%
accuracy on second grade reading passages. Then, for the next 25 weeks, 
measurement probes would be randomly sampled from the second grade reading 
passages, representing approximately equivalent samples of measurement material. 

Studies in which progress toward short-term goals was monitored required 
the identification of a sequence of small segments in a hierarchical curriculum 
to be mastered by the student. For example, the series of objectives might 
specify that the student would read, with 90% accuracy, flashcards first with
consonant-vowel-consonant words, second with final e words, and third with double
vowel words. Proceeding in a fashion parallel to the specification of
objectives, measurement probes first would be drawn from flashcards with
consonant-vowel-consonant words until the mastery criterion was achieved by the
student on that domain. Then, the measurement domain would change so that probes
were flashcards with final e words, and so on.

The second study feature was outcome measure. This variable also had two
levels: dependent measures similar to the monitoring probes, and more global
achievement tests. Employing the examples provided above, probe-like outcome
indices were oral reading rate on second grade passages or percentage read
correctly from flashcards with final e words; global achievement tests were the
Structural Analysis and Reading Comprehension Subtests of the Stanford Diagnostic
Reading Test.

In addition to these two substantive features, a third, methodological
variable was coded for each study, duration of the treatment. This variable had
three levels: treatments implemented for less than 3 weeks (coded "1");
treatments lasting between 3 and 10 weeks (coded "2"); and treatments continued
for more than 10 weeks (coded "3"). A previous investigation (Fuchs & Fuchs, in
press) explored methodological quality of the studies and identified no relation
between effect size magnitude and study quality.

Two raters independently coded 10 of the 21 studies (48%). Percentage of
agreement¹ for the raters on type of goal, outcome measure, and duration of
treatment, respectively, was .90, .80, and 1.00.
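
The agreement index given in Footnote 1 can be sketched as below. The tallies are hypothetical; they are chosen only to be consistent with 10 double-coded studies, since the raw counts of agreements, disagreements, and omissions are not reported.

```python
def percentage_agreement(agreements, disagreements, omissions_a=0, omissions_b=0):
    """Agreement index from Footnote 1: agreements over all coding opportunities."""
    total = agreements + disagreements + omissions_a + omissions_b
    if total == 0:
        raise ValueError("no coding decisions to compare")
    return agreements / total

# Hypothetical tally for one coded variable across 10 double-coded studies.
print(percentage_agreement(agreements=9, disagreements=1))
```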

Characteristics of the Sample 

Of the 23 references listed in the Appendix, which represent 21 separate
investigations,² there are 4 dissertations, 11 unpublished studies, and 8 journal
articles. Among the published papers, 3 appeared in Exceptional Children, 2 in
American Educational Research Journal, and 1 each in Teaching Exceptional
Children, American Journal of Mental Deficiency, and Journal of Precision
Teaching. A total of 3835 subjects participated in these studies, with 83% of
the investigations employing handicapped subjects. Of these handicapped pupils,
98% were mildly to moderately handicapped and 2% were severely handicapped. The
grade level of these subjects ranged from preschool through high school, with a
median grade level of 3.8. Among the 21 investigations, 8 (38%) focused solely
on the academic area of reading, 4 (19%) on reading and math, 3 (14%) only on
math, and 1 (5%) each on (a) high school content areas, (b) preschool skills, (c)
spelling, (d) math and spelling, (e) reading, math, and spelling, and (f)
writing, math, and spelling.



Results

Of the 96 effect sizes, 27 related to long-term goal measurement and 69
to short-term goal measurement. Of the 27 long-term goal effect sizes, 14 were
associated with probe-like and 13 with global outcome measures. Of the 69
short-term goal effect sizes, 37 were related to probe-like and 32 to global
outcome measures.

Relation between treatment duration and other effect size features. A
pair of t tests was run to determine whether measurement goal or outcome measure
was related to the duration of treatment. These tests indicated no statistically
significant associations. For the long-term goal effect sizes, the mean coded
level of treatment duration (see above) was 2.92 (SD = .27); for the short-term
goal effect sizes, 2.75 (SD = .46), t(95) = 1.81, ns. The average levels of
treatment duration for effect sizes associated with probe-like and global outcome
measures, respectively, were 2.78 (SD = .51) and 2.76 (SD = .23), t(95) = .24,
ns. The absence of a relation between treatment duration and type of measurement
goal or dependent measure permits a relatively straightforward interpretation of
the analyses presented below.
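
These comparisons can be approximately reproduced from the summary statistics alone. The sketch below assumes an ordinary two-sample t test with a pooled variance, which the text does not specify, and takes the group sizes from the counts of effect sizes reported at the start of the Results section.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic with a pooled variance, from summary statistics."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se

# Duration codes by type of goal: long-term (n = 27) vs. short-term (n = 69).
t_goal = pooled_t(2.92, 0.27, 27, 2.75, 0.46, 69)
# Duration codes by outcome measure: probe-like (14 + 37) vs. global (13 + 32).
t_outcome = pooled_t(2.78, 0.51, 51, 2.76, 0.23, 45)
print(f"t (type of goal) = {t_goal:.2f}, t (outcome measure) = {t_outcome:.2f}")
```

Under these assumptions, both statistics land close to the reported values; small discrepancies are expected because the published means and standard deviations are rounded.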

Relation between effect size magnitude and effect size features. Table 1
displays the weighted UESs by (a) the type of goal factor (long-term goal vs.
short-term objective) and (b) the outcome measure factor (probe-like vs. global
achievement test). To examine the relation between these variables and effect
size magnitude, Hedges's (1984) analogue to analysis of variance was employed.
When conventional analysis of variance is conducted on effect sizes, problems
exist because of the possibility that systematic variance will be pooled into the
estimate of error variance. Moreover, violation of the homoscedasticity
assumption is severe in research synthesis, and there is little reason to believe
that the usual robustness of the F test will prevail (see Hedges, 1984). Thus,
Hedges's analogue to analysis of variance was employed to avoid these conceptual
and statistical problems. As indicated in Table 1, neither factor produced a
statistically significant difference in the UESs.
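
The logic of an analogue to analysis of variance for effect sizes can be sketched as a partition of a weighted homogeneity statistic into between-group and within-group parts. This is a generic illustration of the approach, not a reproduction of the computations in this study; the effect sizes, variances, and function names are hypothetical.

```python
def weighted_mean(ds, variances):
    """Inverse-variance weighted mean and total weight."""
    weights = [1 / v for v in variances]
    total = sum(weights)
    return sum(w * d for w, d in zip(weights, ds)) / total, total

def q_partition(groups):
    """Split total homogeneity Q into between- and within-group parts.

    `groups` maps a label to a list of (effect_size, variance) pairs.
    Under homogeneity, Q_between is referred to chi-square with
    (number of groups - 1) degrees of freedom.
    """
    all_pairs = [pair for pairs in groups.values() for pair in pairs]
    grand_mean, _ = weighted_mean(*zip(*all_pairs))
    q_between = 0.0
    q_within = 0.0
    for pairs in groups.values():
        ds, vs = zip(*pairs)
        group_mean, group_weight = weighted_mean(ds, vs)
        q_between += group_weight * (group_mean - grand_mean) ** 2
        q_within += sum((d - group_mean) ** 2 / v for d, v in pairs)
    q_total = sum((d - grand_mean) ** 2 / v for d, v in all_pairs)
    return q_between, q_within, q_total

# Hypothetical effect sizes and variances, not values from the reviewed studies.
data = {
    "probe-like": [(0.80, 0.04), (0.90, 0.05), (0.70, 0.03)],
    "global": [(0.40, 0.04), (0.50, 0.06)],
}
qb, qw, qt = q_partition(data)
print(f"Q_between = {qb:.2f}, Q_within = {qw:.2f}, Q_total = {qt:.2f}")
```

Because each effect size is weighted by the inverse of its own variance, unequal variances are handled directly instead of being assumed away, which is the concern raised above about conventional analysis of variance.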



Insert Table 1 about here 



Nevertheless, additional analyses suggested the presence of an 
interaction between type of goal and outcome measure. Specifically, the effect 
of the type of outcome measure was analyzed within each of the type of goal 
conditions. As shown in Table 2 and Figure 1, within the type of goal 
conditions, there were statistically significant differences between UESs 
associated with the probe-like and the global outcome measures. With
CBA-objective, UESs associated with probe-like outcome measures were higher than
those of global measures. For CBA-goal, the reverse was true: UESs associated
with global measures were higher than those related to probe-like outcome
measures.

Insert Table 2 and Figure 1 about here
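
As a rough consistency check, the between-group chi-square values can be approximated from the weighted means and z values printed in Table 2. The sketch assumes that each z value equals the weighted mean divided by its standard error and that the standard error is the inverse square root of the summed weights; these are assumptions about how the table was built, and the helper names are illustrative.

```python
import math

def weight_from_summary(mean_ues, z_value):
    """Recover the total inverse-variance weight from a weighted mean and its z.

    Since z = mean / SE and SE = 1 / sqrt(total weight), weight = (z / mean) ** 2.
    """
    return (z_value / mean_ues) ** 2

def q_between_two_groups(mean1, z1, mean2, z2):
    """Between-group chi-square (df = 1) for two weighted mean effect sizes."""
    w1 = weight_from_summary(mean1, z1)
    w2 = weight_from_summary(mean2, z2)
    return (mean1 - mean2) ** 2 / (1 / w1 + 1 / w2)

def chi2_p_value_df1(q):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(q / 2))

# Weighted means and z values as printed in Table 2 (rounded to two decimals).
q_short = q_between_two_groups(0.85, 22.97, 0.45, 11.54)  # probe-like vs. global
q_long = q_between_two_groups(0.41, 7.32, 0.92, 16.73)    # probe-like vs. global
for label, q in [("short-term", q_short), ("long-term", q_long)]:
    print(f"{label}: Q_between ~ {q:.1f}, p = {chi2_p_value_df1(q):.2g}")
```

The approximations fall in the neighborhood of the tabled chi-square values; exact agreement is not expected because the inputs are rounded to two decimals.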



Discussion 

The purpose of this meta-analysis was to investigate how well measuring
progress toward long- vs. short-term goals relates to contrasting outcome measures
of student achievement. Toward this end, a literature search was conducted,
resulting in the identification of 21 relevant studies that provided sufficient
information for the calculation of meta-analytic statistics. These studies were
coded for long-term vs. short-term goal measurement and for probe-like vs. global
outcome achievement measures. To investigate a possible confound inherent in
such a study, that short-term and long-term goal measurement or probe-like and
global achievement measures might be related to the duration of the experimental
treatment, study durations also were coded. Analyses indicated no reliable
association between either substantive variable and treatment duration.

Analogues to analysis of variance indicated that the magnitude of effect
size was not related either to the type of outcome measure employed or to the
type of goal on which monitoring occurred. However, additional analyses
suggested an interaction: When progress was measured toward long-term goals,
effect sizes on global outcome measures were higher than on probe-like outcomes.
On the other hand, when progress was measured toward series of short-term goals,
effect sizes were greater on probe-like than on global outcome measures.

This finding may be explained in terms of the types of validity
associated with the different goal measurement strategies. With long-term goal
measurement, instructional validity is relatively poor whereas content and
concurrent validity may be comparatively strong. For example, a student might be
measured, over a year-long period, on oral reading rate and accuracy in material
one year above instructional level. Such measures clearly are unrelated to
instructional activities, but have been shown to correlate well with global
measures of reading skills, including tests of decoding, word recognition, and
comprehension (Deno, Mirkin, & Chiang, 1982; Fuchs, 1981). Therefore, it is not
surprising that, in this study, measuring student progress toward long-term goals
was associated more strongly with global achievement outcome measures than with
more narrow measurement probes.

On the other hand, with short-term goal measurement, instructional 
validity is relatively high while content and concurrent validity may be 
comparatively limited. For example, a student might be measured, over a 
year-long period, on a series of short-term objectives, each of which is related
clearly to the current instructional material. However, Quilling and Otto (1971)
found that mastery of a hierarchy of reading decoding skills related
inconsistently to global indices of reading achievement. Therefore, it is not
surprising that, in this investigation, measuring student progress toward 
short-term goals was associated more strongly with performance measures similar 
to the probes on which monitoring occurred than with global achievement measures. 

Teachers may prefer short-term goal measurement because it is easier to 
understand and it guides instruction more directly by providing information about
when to progress from one skill to another (Fuchs, Wesson, Tindal, Mirkin, &
Deno, 1982). Nevertheless, as demonstrated in this meta-analysis, short-term
goal measurement may be misleading: While students master a series of
instructional objectives, progress on more global indices of achievement may be
limited, failing to reflect this gain. Additionally, practitioners frequently
specify long lists of short-term objectives on IEPs (Gillespie-Silver,
Schachter, & Warren, 1980), a phenomenon that can make short-term objective
measurement more cumbersome and time-consuming than long-term goal measurement:
Short-term objective monitoring may require teachers to adapt measurement probes
and procedures more often. 

The finding that long-term goal monitoring relates better to global
achievement outcome measures may be especially important in the education of
handicapped students, who typically have poorly developed strategies for
maintaining and transferring skills (Anderson-Inman, Walker, & Purcell, 1984;
White, 1984). Short-term goal measurement focuses on instructionally related,
relatively restricted domains of material for a period of time and then, upon
mastery of that material, the measurement and instructional focus simultaneously
changes. Such a paradigm may be problematic for at least two reasons. First, a
close connection between instruction and measurement may encourage teachers to
present new skills to students within the framework of the measurement task. For
example, if the measurement procedure requires the pupil to read
consonant-vowel-consonant words from a list, the teacher may focus instruction on
reading consonant-vowel-consonant words from a list. As noted by Goodstein
(1982), there may be danger in tying the instructional format too closely to the
assessment device or of narrowly defining content-x-format domains of
criterion-referenced assessment. Such a restricted instructional format may 
limit the transfer of skills. 

Second, simultaneously changing instructional focus and measurement 
domain may fail to encourage teachers to review material sufficiently to allow 
for long-term skill maintenance and generalization. A more global, long-term 
goal approach to measurement, which still is rooted in the curriculum and is
criterion-referenced, may encourage teachers to incorporate instructional
procedures that better allow for maintenance and generalization of skills.

Findings may be relevant not only to the development of systematic,
continuous progress evaluation procedures but also to teachers' less formal
monitoring strategies, including the periodic use of commercial
criterion-referenced measures such as basal series mastery tests and the Brigance
(1978). Data generated from periodic administrations of such instruments, where
test domains are tied closely and narrowly limited to the instructional focus,
may fail to relate to global academic progress. Therefore, teachers might exert
caution as they interpret such data bases.

Finally, a summative comment seems warranted. As practitioners develop 
their programmatic or IEP goal and objective statements and their related 
curriculum-based assessment procedures for monitoring pupil progress toward those
goals and objectives, it seems important for them to keep in mind the distinction
between curricular and content validity. Curricular validity refers to the match
between testing and IEP goals and objectives; content validity, the
correspondence between testing and the true domain in which proficiency is
desired (Yalow & Popham, 1983). It is only when practitioners write "significant
rather than trivial" (Popham et al., 1985) IEP goals and objectives, which relate
well to the true desired outcome performance, that curricular and content
validity of curriculum-based assessment are both strong. It is only under these
conditions that "measurement-driven instruction" (Popham et al., 1985), or
ongoing assessment of pupils' progress to guide instructional planning, has an
important, global effect on pupil achievement. This, together with findings of
the current meta-analysis, suggests that curriculum-based assessment of long-term
goals, which accurately reflect the desired outcome performance, may represent a
better strategy for monitoring pupil progress than the assessment of narrowly
circumscribed short-term objectives.
Footnotes

¹Percentage of agreement was calculated using the following formula
(Coulter cited in Thompson, White, & Morgan, 1982): Percentage of agreement =
agreements between observer A & observer B/(agreements between A & B +
disagreements between A & B + omissions by A + omissions by B).

²One paper authored by Haring (1971) and two additional reports by Haring
& Krug (1975a, 1975b) described aspects of the same investigation. Therefore,
although it is reported that 21 studies were employed in the meta-analysis, 23
appear in the Appendix due to the separate listing of the Haring and the Haring
and Krug papers.
Table 1

Weighted Mean UESs, z Values, and Chi-Square Statistics as Analogues to
Analysis of Variance by Type of Goal and Outcome Measure Factors

Factor              Weighted X̄    z Valueᵃ    Nᵇ      χ²     df

Type of Goal                                   96      .69     1
  Long-term            .63          16.58      27
  Short-term           .67          24.82      69
Outcome Measure                                96     6.63     1
  Probe-like           .72          23.23      45
  Global               .61          19.06      51

ᵃA significant z value indicates that the weighted mean is reliably different
from zero. All z values are significant beyond the .001 level.

ᵇN represents number of UESs, not number of studies.
Table 2

Weighted Mean UESs, z Values, and Chi-Square Statistics as Analogues to
Analysis of Variance for Probe-Like and Global Outcome Measures within
Type of Goal Conditions

Type of Goal/
Outcome Measure     Weighted X̄    z Valueᵃ    Nᵇ      χ²      df

Short-Term Goal
Outcome Measure                                69     56.78ᶜ    1
  Probe-Like           .85          22.97      37
  Global               .45          11.54      32
Long-Term Goal
Outcome Measure                                27     41.59ᶜ    1
  Probe-Like           .41           7.32      14
  Global               .92          16.73      13

ᵃA significant z value indicates that the weighted mean is reliably different
from zero. All z values are significant beyond the .001 probability level.

ᵇN represents number of UESs, not number of studies.

ᶜp < .001.
Figure Caption

Figure 1. Unbiased mean effect sizes (UESs) for CBA-objective
and CBA-goal on probe-like and global outcome measures.